Iteration: Loops and purrr

E. Nordmoe

Outline

Tools for Reducing Code Duplication

Iteration

A Basic Loop

output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[[i]] <- median(df[[i]])      # 3. body
}
output

Components

Try on mtcars
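
A minimal sketch of the basic loop run on mtcars, assuming the built-in dataset stands in for df. The names() line is an addition for readability and is not part of the original loop.

```r
# Assumption: the built-in mtcars data frame plays the role of df
df <- mtcars

output <- vector("double", ncol(df))  # 1. output: pre-allocate one slot per column
for (i in seq_along(df)) {            # 2. sequence: loop over column positions
  output[[i]] <- median(df[[i]])      # 3. body: store the median of column i
}

# Added for readability: label each median with its column name
names(output) <- names(df)
output
```

Pre-allocating the output vector avoids growing it on every pass through the loop, which matters more as the number of columns increases.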

Generalize: wrap the loop in a function

col_mean <- function(df) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}
col_mean(mtcars)

Try on mtcars

for loop variations

  1. Modifying an existing object, instead of creating a new object.
  2. Looping over names or values, instead of indices.
  3. Handling outputs of unknown length.
  4. Handling sequences of unknown length.

==> See R4DS for more details!
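
The variations listed above can be sketched as follows. These examples follow the general patterns described in R4DS; the specific vectors and the coin-flip helper flip() are illustrative assumptions, not part of the original slides.

```r
# Variation 2: loop over names instead of indices
for (nm in names(mtcars)) {
  cat(nm, ":", mean(mtcars[[nm]]), "\n")
}

# Variation 3: output of unknown length.
# Collect the pieces in a list, then combine once at the end.
set.seed(1)
means <- c(0, 1, 2)
pieces <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)                  # length is not known in advance
  pieces[[i]] <- rnorm(n, means[[i]])
}
out <- unlist(pieces)

# Variation 4: sequence of unknown length.
# Use while() when the number of iterations is not known up front.
flip <- function() sample(c("T", "H"), 1)  # hypothetical helper
flips <- 0
heads_in_a_row <- 0
while (heads_in_a_row < 3) {
  if (flip() == "H") {
    heads_in_a_row <- heads_in_a_row + 1
  } else {
    heads_in_a_row <- 0
  }
  flips <- flips + 1
}
flips
```

Variation 1 (modifying an existing object) is the pattern used in the standardize example.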

Modify Data Frame: Standardize

Step 1: Create the Function

standardize <- function(x) {
  xbar <- mean(x, na.rm = TRUE)
  sdx <- sd(x, na.rm = TRUE)
  (x - xbar) / sdx
}

Modify Data Frame: Standardize (cont’d)

Step 2: Apply the function by looping

for (i in seq_along(mtcars)) { 
  mtcars[[i]] <- standardize(mtcars[[i]])
}
head(mtcars)
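
One way to check the result, sketched here on a copy so the original mtcars stays untouched: after standardizing, every column should have mean 0 and standard deviation 1, up to floating-point error. The name cars_std is an assumption introduced for this illustration.

```r
standardize <- function(x) {
  xbar <- mean(x, na.rm = TRUE)
  sdx <- sd(x, na.rm = TRUE)
  (x - xbar) / sdx
}

# Assumption: work on a copy rather than overwriting mtcars
cars_std <- mtcars
for (i in seq_along(cars_std)) {
  cars_std[[i]] <- standardize(cars_std[[i]])
}

# Column means should be (numerically) zero, column sds should be one
round(sapply(cars_std, mean), 10)
sapply(cars_std, sd)
```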

Using for loops to do functional programming

col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])
  }
  out
}
col_summary(df, median)
col_summary(df, mean)

Try on mtcars
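
A short sketch of col_summary() applied to mtcars, assuming mtcars stands in for df. The same function body handles any summary that returns a single number, because the summary function is itself passed in as an argument.

```r
col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])   # fun is whatever function the caller passed in
  }
  out
}

col_summary(mtcars, median)
col_summary(mtcars, mean)
col_summary(mtcars, sd)
```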

Use purrr to simplify

  • The map functions loop over a vector (or the columns of a data frame), apply a function to each element, and save the results
  • The specific map function determines the output type:
    • map() ==> list
    • map_lgl() ==> logical vector
    • map_dbl() ==> double vector
    • and so on

Try these on mtcars

map(df, mean)
map_dbl(df, mean)
# Or even this? 
map_chr(df, mean)
# Or this, surely not
map_int(df, mean)
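
A sketch of what to expect on mtcars, assuming the purrr package is installed. map_int() should fail because the column means are not whole numbers, so the call is wrapped in tryCatch() to capture the error message instead of stopping. map_chr() is left out of the code because its behaviour appears to depend on the purrr version (older releases convert the numbers to strings, while newer ones warn that this coercion is deprecated).

```r
library(purrr)

means_list <- map(mtcars, mean)      # list with one element per column
means_dbl  <- map_dbl(mtcars, mean)  # named double vector

str(means_list[1:2])
means_dbl

# Non-integer means cannot be stored in an integer vector,
# so map_int() signals an error; capture the message for inspection.
int_result <- tryCatch(
  map_int(mtcars, mean),
  error = function(e) conditionMessage(e)
)
int_result
```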

Recall Scraping Script

  • Uses map_df() to apply scrape_speech() to each URL and bind the resulting rows into one data frame
map_df(covid_speech_urls, scrape_speech)
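
The row-binding pattern can be sketched without the scraping script. Here describe_column() is a hypothetical stand-in for scrape_speech(): it returns a one-row data frame per input. The sketch uses map() followed by list_rbind(), which is the replacement suggested for map_df() as of purrr 1.0.0 and so assumes that version or later.

```r
library(purrr)

# Hypothetical stand-in for scrape_speech(): one row per input value
describe_column <- function(name) {
  data.frame(
    column = name,
    mean = mean(mtcars[[name]]),
    stringsAsFactors = FALSE
  )
}

cols <- c("mpg", "hp", "wt")

# map() returns a list of one-row data frames; list_rbind() stacks them
summary_df <- list_rbind(map(cols, describe_column))
summary_df
```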

Mapping over two arguments: map2()

# mu and sigma are the arguments
# rnorm is the function
mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)
map2(mu, sigma, rnorm, n = 5)
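
To see what map2() is doing, here is a sketch that compares it with the equivalent for loop. Because n = 5 is passed by name, the paired elements of mu and sigma fill the remaining rnorm() arguments, mean and sd, in that order. The set.seed() calls and the names via_map2 and via_loop are additions for the comparison.

```r
library(purrr)

mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)

set.seed(42)
via_map2 <- map2(mu, sigma, rnorm, n = 5)

# The same computation written as an explicit loop
set.seed(42)
via_loop <- vector("list", length(mu))
for (i in seq_along(mu)) {
  via_loop[[i]] <- rnorm(n = 5, mean = mu[[i]], sd = sigma[[i]])
}

identical(via_map2, via_loop)
```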

Mapping over multiple arguments: pmap()

pmap(list(mu, sigma), rnorm, n = 5)
# or more generally
n <- list(1, 3, 5)
# name arguments for stability
pmap(list(mean = mu, sd = sigma, n = n), rnorm)
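
Since a data frame is a list of equal-length columns, the named-argument form of pmap() also works with a data frame of parameters, one row per call. A minimal sketch, where the data frame params is an assumption introduced for illustration:

```r
library(purrr)

# Each row holds the arguments for one call to rnorm();
# column names must match rnorm()'s argument names.
params <- data.frame(
  n = c(1, 3, 5),
  mean = c(5, 10, -3),
  sd = c(1, 5, 10)
)

set.seed(1)
draws <- pmap(params, rnorm)

# One result per row, with as many draws as that row's n
lengths(draws)
```

Keeping the parameters in a data frame makes it easy to see which values belong together in each call.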

Study the purrr Cheat Sheet

https://posit.co/wp-content/uploads/2022/10/purrr.pdf

Advanced functional programming!

fun <- function(f) pmap(list(x = mtcars, na.rm = TRUE), f)
param <- list(list(mean), list(median), list(sd))

# was (invoke_map() is now deprecated)
invoke_map(.f = fun, .x = param)
# now
map(param, fun)

  1. At the outer level (invoke_map() before, map() now), fun is called once for each entry of param; the entries hold the functions we want to apply to mtcars.
  2. Inside fun, pmap() applies that function to each column of mtcars in turn, passing na.rm = TRUE on every call.
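
The same two-level idea can be sketched in a simpler form, assuming a plain named list of functions instead of nested lists. This is an alternative illustration rather than the slide's own code; summaries and apply_to_columns() are hypothetical names.

```r
library(purrr)

# Outer level: the functions we want to apply
summaries <- list(mean = mean, median = median, sd = sd)

# Inner level: apply one function f to every column of mtcars
apply_to_columns <- function(f) {
  map_dbl(mtcars, f, na.rm = TRUE)
}

# map() calls apply_to_columns() once per summary function
results <- map(summaries, apply_to_columns)
results$mean
```

Naming the list elements means the results can be pulled out by name, such as results$median.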

Try these on Pokemon data
